30 research outputs found

    Genome wide prediction of HNF4α functional binding sites by the use of local and global sequence context

    An application of machine learning algorithms enables prediction of the functional context of transcription factor binding sites in the human genome

    FeatureScan: revealing property-dependent similarity of nucleotide sequences

    FeatureScan is a software package aiming to reveal novel types of DNA sequence similarity by comparing physico-chemical properties. Thirty-eight different parameters of DNA double strands, such as charge, melting enthalpy and conformational parameters, are provided. As input, FeatureScan requires two sequences: a pattern sequence and a target sequence. Search conditions are set by selecting a specific DNA parameter and a threshold value. Search results are displayed in FASTA format and directly linked to external genome databases/browsers (ENSEMBL, NCBI, UCSC). An Internet version of FeatureScan is accessible at . As part of the HOBIT initiative () FeatureScan is also accessible as a web service at its above home page. Currently, several preloaded genomes are provided at this Internet website (Homo sapiens, Mus musculus, Rattus norvegicus and four strains of Escherichia coli) as target sequences. Standalone executables of FeatureScan are available on request
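
    The pattern/target/threshold search described in this abstract can be illustrated with a minimal sketch. It is not FeatureScan's code: the property used here (Watson-Crick hydrogen bonds per base pair), the distance measure (mean absolute difference) and the names `property_profile` and `scan` are all assumptions chosen for illustration.

```python
# Stand-in property: Watson-Crick hydrogen bonds per base pair.
# FeatureScan offers 38 parameters; this single one is an assumption.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def property_profile(seq):
    """Convert a DNA sequence into a list of per-base property values."""
    return [H_BONDS[base] for base in seq.upper()]


def scan(pattern, target, threshold):
    """Slide the pattern along the target and return (start, distance)
    for every window whose mean absolute property difference to the
    pattern is at most `threshold`."""
    p = property_profile(pattern)
    t = property_profile(target)
    hits = []
    for start in range(len(t) - len(p) + 1):
        window = t[start:start + len(p)]
        dist = sum(abs(a - b) for a, b in zip(p, window)) / len(p)
        if dist <= threshold:
            hits.append((start, dist))
    return hits


# "GGCC" and "CCGG" differ as strings but share the same property profile.
hits = scan("GGCC", "ATATCCGGATAT", threshold=0.0)
```

    The example shows the point of property-based comparison: the window found at position 4 is not identical in sequence to the pattern, yet it matches exactly under the chosen property.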

    Advanced Computational Biology Methods Identify Molecular Switches for Malignancy in an EGF Mouse Model of Liver Cancer

    The molecular causes by which the epidermal growth factor receptor tyrosine kinase induces malignant transformation are largely unknown. To better understand EGF's transforming capacity, whole genome scans were applied to a transgenic mouse model of liver cancer and subjected to advanced methods of computational analysis to construct de novo gene regulatory networks based on a combination of sequence analysis and entrained graph-topological algorithms. Here we identified transcription factors, processes, key nodes and molecules to connect as yet unknown interacting partners at the level of protein-DNA interaction. Many of those could be confirmed by electromobility band shift assay at recognition sites of gene specific promoters and by western blotting of nuclear proteins. A novel cellular regulatory circuitry could therefore be proposed that connects cell cycle regulated genes with components of the EGF signaling pathway. Promoter analysis of differentially expressed genes suggested the majority of regulated transcription factors to display specificity to either the pre-tumor or the tumor state. Subsequent search for signal transduction key nodes upstream of the identified transcription factors and their targets suggested the insulin-like growth factor pathway to render the tumor cells independent of EGF receptor activity. Notably, expression of IGF2 in addition to many components of this pathway was highly upregulated in tumors. Together, we propose a switch in autocrine signaling to foster tumor growth that was initially triggered by EGF and demonstrate the knowledge gain from promoter analysis combined with upstream key node identification

    Molecular mechanistic associations of human diseases

    Background: The study of relationships between human diseases provides new possibilities for biomedical research. Recent achievements on human genetic diseases have stimulated interest to derive methods to identify disease associations in order to gain further insight into the network of human diseases and to predict disease genes.
    Results: Using about 10000 manually collected causal disease/gene associations, we developed a statistical approach to infer meaningful associations between human morbidities. The derived method clustered cardiometabolic and endocrine disorders, immune system-related diseases, solid tissue neoplasms and neurodegenerative pathologies into prominent disease groups. Analysis of biological functions confirmed characteristic features of corresponding disease clusters. Inference of disease associations was further employed as a starting point for prediction of disease genes. Efforts were made to underpin the validity of results by relevant literature evidence. Interestingly, many inferred disease relationships correspond to known clinical associations and comorbidities, and several predicted disease genes were subjects of therapeutic target research.
    Conclusions: Causal molecular mechanisms present a unifying principle to derive methods for disease classification, analysis of clinical disorder associations, and prediction of disease genes. According to the definition of causal disease genes applied in this study, these results are not restricted to genetic disease/gene relationships. This may be particularly useful for the study of long-term or chronic illnesses, where pathological derangement due to environmental or as part of sequel conditions is of importance and may not be fully explained by genetic background.

    pcaGoPromoter - An R Package for Biological and Regulatory Interpretation of Principal Components in Genome-Wide Gene Expression Data

    Analyzing data obtained from genome-wide gene expression experiments is challenging due to the quantity of variables, the need for multivariate analyses, and the demands of managing large amounts of data. Here we present the R package pcaGoPromoter, which facilitates the interpretation of genome-wide expression data and overcomes the aforementioned problems. In the first step, principal component analysis (PCA) is applied to survey any differences between experiments and possible groupings. The next step is the interpretation of the principal components with respect to both biological function and regulation by predicted transcription factor binding sites. The robustness of the results is evaluated using cross-validation, and illustrative plots of PCA scores and gene ontology terms are available. pcaGoPromoter works with any platform that uses gene symbols or Entrez IDs as probe identifiers. In addition, support for several popular Affymetrix GeneChip platforms is provided. To illustrate the features of the pcaGoPromoter package a serum stimulation experiment was performed and the genome-wide gene expression in the resulting samples was profiled using the Affymetrix Human Genome U133 Plus 2.0 chip. Array data were analyzed using pcaGoPromoter package tools, resulting in a clear separation of the experiments into three groups: controls, serum only and serum with inhibitor. Functional annotation of the axes in the PCA score plot showed the expected serum-promoted biological processes, e.g., cell cycle progression and the predicted involvement of expected transcription factors, including E2F. In addition, unexpected results, e.g., cholesterol synthesis in serum-depleted cells and NF-κB activation in inhibitor treated cells, were noted. In summary, the pcaGoPromoter R package provides a collection of tools for analyzing gene expression data. These tools give an overview of the input data via PCA, functional interpretation by gene ontology terms (biological processes), and an indication of the involvement of possible transcription factors
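
    The first step named in this abstract, PCA as a survey of sample groupings, can be sketched as follows. This is a plain-Python illustration and not the pcaGoPromoter API (which is an R package): the function name `first_principal_component`, the use of power iteration, and the toy expression matrix are all assumptions.

```python
import math


def first_principal_component(samples, iterations=200):
    """Return (loadings, scores) for PC1 of a samples-by-genes matrix."""
    n, m = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in samples]
    # Sample covariance matrix between genes (m x m).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Power iteration converges to the eigenvector with the largest
    # eigenvalue, assuming the start vector is not orthogonal to it.
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered sample onto the PC1 direction.
    scores = [sum(r[j] * v[j] for j in range(m)) for r in centered]
    return v, scores


# Hypothetical data: two control and two treated samples, three genes.
expression = [
    [1.0, 1.1, 5.0],  # control 1
    [1.2, 0.9, 5.1],  # control 2
    [3.0, 3.1, 5.0],  # treated 1
    [3.2, 2.9, 4.9],  # treated 2
]
loadings, scores = first_principal_component(expression)
```

    In this toy matrix the PC1 scores of the two controls fall on one side of zero and those of the treated samples on the other, which is the kind of grouping a score plot is meant to reveal. The genes with the largest absolute loadings would be the ones passed on to functional annotation.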

    A correlation with exon expression approach to identify cis-regulatory elements for tissue-specific alternative splicing

    Correlation of motif occurrences with gene expression intensity is an effective strategy for elucidating transcriptional cis-regulatory logic. Here we demonstrate that this approach can also identify cis-regulatory elements for alternative pre-mRNA splicing. Using data from a human exon microarray, we identified 56 cassette exons that exhibited higher transcript-normalized expression in muscle than in other normal adult tissues. Intron sequences flanking these exons were then analyzed to identify candidate regulatory motifs for muscle-specific alternative splicing. Correlation of motif parameters with gene-normalized exon expression levels was examined using linear regression and linear splines on RNA words and degenerate weight matrices, respectively. Our unbiased analysis uncovered multiple candidate regulatory motifs for muscle-specific splicing, many of which are phylogenetically conserved among vertebrate genomes. The most prominent downstream motifs were binding sites for Fox1- and CELF-related splicing factors, and a branchpoint-like element acuaac; pyrimidine-rich elements resembling PTB-binding sites were most significant in upstream introns. Intriguingly, our systematic study indicates a paucity of novel muscle-specific elements that are dominant in short proximal intronic regions. We propose that Fox and CELF proteins play major roles in enforcing the muscle-specific alternative splicing program, facilitating expression of unique isoforms of cytoskeletal proteins critical to muscle cell function
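
    The core idea of this abstract, regressing exon expression on motif occurrences in flanking introns, can be sketched in a few lines. The sketch below is an assumption-laden illustration, not the authors' pipeline: the RNA word UGCAUG is used as an example of a Fox-type site, and the intron sequences, expression values and function names are hypothetical.

```python
import math


def count_word(seq, word):
    """Count (possibly overlapping) occurrences of `word` in `seq`."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def least_squares(x, y):
    """Return (slope, intercept, Pearson r) of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)


# Hypothetical downstream intron fragments and normalized exon expression.
introns = ["UGCAUGAAUGCAUG", "AAUGCAUGAA", "AAAAAAAA", "UGCAUGUGCAUGUGCAUG"]
exon_expression = [2.1, 0.9, 0.1, 2.9]

counts = [count_word(seq, "UGCAUG") for seq in introns]
slope, intercept, r = least_squares(counts, exon_expression)
```

    A word whose counts give a large positive slope and a high correlation across many exons would be a candidate enhancer of muscle-specific inclusion. A real analysis would scan all words of a given length and correct for multiple testing.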

    Cohesin Proteins Promote Ribosomal RNA Production and Protein Translation in Yeast and Human Cells

    Cohesin is a protein complex known for its essential role in chromosome segregation. However, cohesin and associated factors have additional functions in transcription, DNA damage repair, and chromosome condensation. The human cohesinopathy diseases are thought to stem not from defects in chromosome segregation but from gene expression. The role of cohesin in gene expression is not well understood. We used budding yeast strains bearing mutations analogous to the human cohesinopathy disease alleles under control of their native promoter to study gene expression. These mutations do not significantly affect chromosome segregation. Transcriptional profiling reveals that many targets of the transcriptional activator Gcn4 are induced in the eco1-W216G mutant background. The upregulation of Gcn4 was observed in many cohesin mutants, and this observation suggested protein translation was reduced. We demonstrate that the cohesinopathy mutations eco1-W216G and smc1-Q843Δ are associated with defects in ribosome biogenesis and a reduction in the actively translating fraction of ribosomes, eIF2α-phosphorylation, and 35S-methionine incorporation, all of which indicate a deficit in protein translation. Metabolic labeling shows that the eco1-W216G and smc1-Q843Δ mutants produce less ribosomal RNA, which is expected to constrain ribosome biogenesis. Further analysis shows that the production of rRNA from an individual repeat is reduced while copy number remains unchanged. Similar defects in rRNA production and protein translation are observed in a human Roberts syndrome cell line. In addition, cohesion is defective specifically at the rDNA locus in the eco1-W216G mutant, as has been previously reported for Roberts syndrome. Collectively, our data suggest that cohesin proteins normally facilitate production of ribosomal RNA and protein translation, and this is one way they can influence gene expression. Reduced translational capacity could contribute to the human cohesinopathies

    Data on master regulators and transcription factor binding sites found by upstream analysis of multi-omics data on methotrexate resistance of colon cancer

    Computational analysis of master regulators through the search for transcription factor binding sites followed by analysis of signal transduction networks of a cell is a new approach to causal analysis of multi-omics data. This paper contains results of the analysis of multi-omics data that include transcriptomics, proteomics and epigenomics data of a methotrexate (MTX)-resistant colon cancer cell line. The data were used for analysis of mechanisms of resistance and for prediction of potential drug targets and promising compounds for reverting the MTX resistance of these cancer cells. We present all results of the analysis, including the lists of identified transcription factors and their binding sites in the genome and the list of predicted master regulators, which are potential drug targets. These data were generated in the study recently published in the article “Multi-omics “Upstream Analysis” of regulatory genomic regions helps identifying targets against methotrexate resistance of colon cancer” (Kel et al., 2016) [4]. These data are of interest for researchers from the field of multi-omics data analysis and for biologists who are interested in identification of novel drug targets against MTX resistance

    TRANSCompel(®): a database on composite regulatory elements in eukaryotic genes

    Originating from COMPEL, the TRANSCompel(®) database emphasizes the key role of specific interactions between transcription factors binding to their target sites providing specific features of gene regulation in a particular cellular context. Composite regulatory elements contain two closely situated binding sites for distinct transcription factors and represent minimal functional units providing combinatorial transcriptional regulation. Both specific factor–DNA and factor–factor interactions contribute to the function of composite elements (CEs). Information about the structure of known CEs and specific gene regulation achieved through such CEs appears to be extremely useful for promoter prediction, for gene function prediction and for applied gene engineering as well. Each database entry corresponds to an individual CE within a particular gene and contains information about two binding sites, two corresponding transcription factors and experiments confirming cooperative action between transcription factors. The COMPEL database, equipped with the search and browse tools, is available at http://www.gene-regulation.com/pub/databases.html#transcompel. Moreover, we have developed the program CATCH™ for searching potential CEs in DNA sequences. It is freely available as CompelPatternSearch at http://compel.bionet.nsc.ru/FunSite/CompelPatternSearch.html
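
    The definition given here, two closely situated binding sites for distinct transcription factors, suggests a simple candidate search over predicted sites. The sketch below is not the CATCH™ algorithm or the TRANSCompel schema: the `Site` record, the `max_gap` cutoff of 20 bp, and the factor names and coordinates are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    factor: str  # transcription factor name
    start: int   # 0-based start of the predicted site
    end: int     # exclusive end of the predicted site


def candidate_composite_elements(sites, max_gap=20):
    """Pair non-overlapping sites of distinct factors whose gap is at
    most `max_gap` base pairs (a hypothetical cutoff)."""
    ordered = sorted(sites, key=lambda s: s.start)
    pairs = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1:]:
            gap = right.start - left.end
            if gap > max_gap:
                break  # later sites start even further away
            if gap >= 0 and left.factor != right.factor:
                pairs.append((left, right))
    return pairs


# Hypothetical predicted sites in one promoter.
sites = [
    Site("NFAT", 100, 109),
    Site("AP-1", 112, 119),
    Site("AP-1", 300, 307),
    Site("NFAT", 305, 314),  # overlaps the previous site, so not paired
]
pairs = candidate_composite_elements(sites)
```

    Proximity alone only nominates candidates. As the abstract notes, a database entry also records experiments confirming cooperative action between the two factors, which no positional filter can replace.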